Authorship Analysis based on Data Compression (14 Feb 2014)
Abstract
This paper proposes performing authorship analysis with the Fast Compression Distance (FCD), a similarity measure based on compression with dictionaries extracted directly from the written texts. The FCD computes the similarity between two documents through an effective binary search on the intersection of their two dictionaries. In the reported experiments, the proposed method is applied to documents that are heterogeneous in style, written in five different languages, and drawn from different historical periods. Results are comparable to the state of the art and outperform traditional compression-based methods.
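The dictionary-intersection idea can be illustrated with a minimal sketch. The abstract does not specify how the dictionaries are built or how the distance is normalized, so the code below makes two assumptions: an LZW-style pass extracts the phrases, and the distance takes the form (|D(x)| - |D(x) ∩ D(y)|) / |D(x)|. The function names `lzw_dictionary`, `contains`, and `fcd` are hypothetical and chosen for illustration only.

```python
from bisect import bisect_left


def lzw_dictionary(text: str) -> list[str]:
    """Return the sorted multi-character phrases learned by an LZW-style pass.

    Assumption: the paper's dictionaries are LZW-like; the abstract does not say.
    """
    known = set(text)      # single characters seed the dictionary
    phrases = set()        # only the newly learned, longer phrases are kept
    current = ""
    for ch in text:
        candidate = current + ch
        if candidate in known:
            current = candidate          # keep extending the current match
        else:
            known.add(candidate)         # learn a new phrase
            phrases.add(candidate)
            current = ch                 # restart from the current character
    return sorted(phrases)


def contains(sorted_items: list[str], item: str) -> bool:
    """Binary search for item in a sorted list."""
    i = bisect_left(sorted_items, item)
    return i < len(sorted_items) and sorted_items[i] == item


def fcd(x: str, y: str) -> float:
    """Assumed distance form: share of x's phrases that are absent from y."""
    dx = lzw_dictionary(x)
    dy = lzw_dictionary(y)
    if not dx:
        return 0.0                       # convention chosen for this sketch only
    shared = sum(1 for phrase in dx if contains(dy, phrase))
    return (len(dx) - shared) / len(dx)
```

Under these assumptions a value near 0 means the two texts share most of their phrases, and a value near 1 means they share almost none. Each dictionary is extracted and sorted once per document, so comparing a query text against many candidate authors only repeats the binary-search step.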